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Comparison of Multi-quark Matrix Inversion Algoritmes 
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We test iterative algorithms, MR, QMR75 and BiCG7s, to compare their efficiency in matrix inversion with 
multi-quarks (shifted matrices) within one iteration process. Our results on the 8 3 x 12 and 16 3 x 24 show that 
MR admits multi-quark calculation with less memory requirement, whereas QMR is faster for the single quark 
calculation. 



1. INTRODUCTION 

To describe the inversion algorithms for multi- 
quarks, we start by giving a general formula for 
shifted matrix, a multiple of the identity plus a 
constant off-diagonal part, 



A(a) =al+A. 



(1) 



The parameter a stands for a whole trajectory. 
In lattice QCD theory the fermion matrix, with 
respect to quark mass pi] 
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m = 5~( ), 

has the shifted structure [j 
to each other by, 

M heavy = f(m) + Mught, 



(2) 

and they are related 
(3) 



where term /(m) = (kJ^ - > 0, M Ught 

and Mi ie avy are fermion matrices for light quark 
(considered as seed system) and heavy quark {ex- 
trapolated system) respectively. 

By numbering all even sites before the odd 
ones, we rewrite Dirac equation Mx — (f> as 
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(4) 



where a = X/k, then separate it into the so-called 
even-odd preconditioned form: 



M e X e = (f> e , X 

M K = <t 2 - D m D m 



n((j) + D oe x e ); (5) 
<j) e = a(j> e + D eo (j) . 



For the smeared source the equation M e x e = 
is further decoupled, by setting x e — cry e + z e , 



M e y e = 4> e 



M e z K = D e 



(6) 



such that the right hands of these equations are 
independent of k. 

An iterative process to solve the nonsingular 
system Ax = b starts from an initial guess x° and 
an initial residual r° = b — Ax° . The nonsymmet- 
ric Lanczos process generates an orthogonal 
basis for the Krylov subspace 

K m (A,r°) H^.^.A ,-,^ }, (7) 

to obtain an approximate solution x m in mth 
step iteratively with short recurrences and to keep 
x m + K m (A, r°). It is essential to notice for 
inversion with multi-quarks that, on the trajec- 
tory of the shifted matrices, their Krylov spaces 
are identical ^,|). 

Two directions to achieve good efficiency, be- 
sides a good preconditioning'!^], are considered 
currently & (a) Acceleration of convergence us- 
ing improved iterative procedures (such as QMR 
and BiCGStab2). (b) Exploitation of struc- 
ture of the matrix M in the inverters (such 
as 75-symmetry and shifted properties). In 
this paper we attempt to test Minimal Residual 
(MR), Quasi-Minimal Residual (QMR) and Bi- 
Conjugate Gradient (BiCG), exploiting the 75- 
symmetry and using the shifted feature for inver- 
sion with multi-quarks 0-^]. For definiteness we 
consider only 5-sources and solve one of the ex- 
pressions in eq. 6 as y e — or z e = 0. 
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Figure 1. The relative residuals versus iteration 
using M 3 R algorithm, on 8 3 x 12 lattice. 



2. ALGORITHM 

First we mention that, after even-odd precon- 
dition, the matrix M e still has 75-symmetry and 
they are shifted matrices for multi-quarks. Our 
numerical computations were done for the even- 
odd preconditioned systems on lattices 8 3 x 12, 
and 16 3 x 24 and for the quenched gauge con- 
figurations at f3 = 6.0 (k c c± 0.157). We tested 
the Multiple-Masses- Minimal Residual (M 3 R) 
method for matrix inversion with multi-quarks 
and then compare the results of convergence rates 
with those obtained by using the QMR and BiCG 
algorithms. 

2.1. M 3 R Algorithm[8] 

To solve (crl + A)x = b, the M 3 R algorithm is 
given by (initial: x° = 0, r° = 6- At , ] a _ x = 1), 

do m = 0, 1, • • • , to convergence 
p m = Ar m 

(p m )t r m 
am = %™)V™ 

fa 

ra _ J m — 1 

Im = + aa m ) 

x m+l = x m + a m r m 

%cr = X cr "F a mf m r 

r m+1 = r m - a m Ar m 
end do 



where x m and r m are the mth approximate so- 
lution and residual respectively for the seed sys- 
tem, is the mth approximate solution for one 
of the extrapolated systems and coefficient is 
iterated step by step for each quark mass. It is 
necessary to take x° = for seed system to keep 
all initial residuals r° to be the same for different 
quark masses. As shown by the algorithm, the 
matrix-vector multiplication performs only once 
in the whole set {a} at each iterative step. For 
each additional quark mass, the price to pay is 
one vector x^ to be stored and a little CPU times 
(about 8% for scalar products), with no addi- 
tional matrix multiplication performed. We take 
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Figure 2. The same as Figure 1. but on 16 3 x 24 
lattice. 



the system at k = 0.156 as a seed and extrapolate 
to heavier quarks at k = 0.155, 0.154, 0.152, 0.150 
and 0.148, as shown in Fig. 1 and Fig. 2. The 
results give evidence that the gain factor is about 
2 by using the M 3 R for 5 extrapolated quarks as 
compared to calculating the 5 quarks separately 
for 5 sources. The overrelaxation parameter oj is 
chosen to be uo = 1.1 (Fig. 3) for best conver- 
gence rate. In these plots the relative residual is 
defined by || b - Ax m || / || r° ||. The stopping 
criteria for convergence is || r m \\< 10~ 9 . 
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Figure 3. The convergence rate in MR for differ- 
ent values of uj on 16 3 x 24 lattice. 



Figure 4. The relative residuals versus iteration 
by using the QMR algorithm, on 16 3 x 24 lattice. 



2.2. QMR75 without look-ahead 

The Quasi-Minimal Residual exploiting the 75- 
symmetry is described in ref. ||. To solve eq. 6 
for shifted matrix (see eq. 1), it performs: 



do m 



1, 2, ■ • ■ , to convergence 



{I. do Lanczos step} 

= (75^™)^™ 

— p m S m I S m — 1 

= Av m - (a m - a)v m - m v m - 1 

= II r m || 

= r m+1 /p m+1 

for QMR recurrence coefficients} 

— * {d,e,c m+ i, S m+ i,Xm+l} 
{III do QMR iterations} 

K - e V m - x - 9p m - 2 )/ Xm+ i 

&m-\-lpm 1 

■<- ' x m + p mP m , 

Sm-\-lpm 



p m 

r m+l 

Pm+1 
v m+l 

{II. 

1 ; ftm / 



p 



m+l 



Pm+1 — 

end do 

In steps II and III, there is no matrix multi- 
plication in the mth approximation. It is obvi- 
ous that, to solve the Dirac equation with multi- 
quarks, the matrix multiplication is carried out 
only at a = point for the whole set of er. 



Due to the 75-symmetry of M e , 75Mj'75 = M e , 
the computational effort at each Lanczos step re- 
duces to one matrix multiplication, instead of 
two (for M e and Mj each), and the coefficients 
are all real. The price to pay is three vectors 
{x m (a),p m (a),p m ~ 1 (o-)} to be stored for each 
additional quark. Fig. 4 gives the results of 
the relative residuals versus the iteration steps at 
several k's respectively. The plot of Fig. 5 shows 
that QMR75 is faster than MR in convergency 
at k — 0.156. But this feature could be reduced 
by the GMRES(4), which can save 30% iterations 
compared with MR, but at the expense of 3 more 
vectors in memory Q. 

2.3. BiCG 75 Algorithm 

BiCG method^] exploiting the 75-symmetry is 

■ , to convergence 



do m 


= 0,1,-- 




= (J 5 r m 


x m+l 






= x m + 


r m+l 






= r m -t 




= (J 5 r m 


p m+l 


= r m+l 


end do 





u mP 

5 m A(a) 



V( 75 r m )tr™ 



This two-term recurrence method has difficulty 
in memory capacity for multi-quarks: the coeffi- 
cients in the mth step for different values of a 
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Figure 5. The curves of convergency at k = 0.156 
using MR and QMR75 algorithms, from up to 
low, on Lattice I: 16 3 x 24 and II: 8 3 x 12. 



Figure 6. The relative residuals versus iteration 
by using the BiCG algorithm, on 16 3 x 24 lattice. 



can not be obtained from those for a = by 
short recurrences. In addition, this algorithm also 
shows large fluctuations in the relative residual 
(Fig. 6) which can be eliminated by the variant 
BiCGStab|| algorithm. 

3. CONCLUSION 

For problems involving the inversion of multi- 
quark matrices, we find M 3 R or GMRES to be 
a good compromise if the memory is limited. It 
requires only one more vector for each additional 
quark and the overhead in CPU time is minimal 
(~ 8%). On the other hand, QMR is faster than 
M R. However, it requires memory of 3 vectors 
for each additional quark and a look-ahead algo- 
rithm to avoid the breakdown pf|. BiCG75 does 
not admit multi-quark implementation with short 
recurrences and the relative residual fluctuates in 
a large range. 
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